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Abstract 

We generalize the maximum likelihood method to non-Gaussian distribution 
functions by means of the multivariate Edgeworth expansion. We stress the 
potential interest of this technique in all those cosmological problems in which 
the determination of a non-Gaussian signature is relevant, e.g. in the analysis of 
large scale structure and cosmic microwave background. A first important result 
is that the asymptotic confidence limits on the parameters are systematically 
widened when the higher order correlation functions are positive, with respect 
to the Gaussian case. 
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1 Introduction 



Modern large scale astronomy is, to a large extent, the science of non-Gaussian random 
fields. One of the keys to understanding the formation and evolution of structure in the 
Universe resides infact in the statistical properties of the matter field. Rival theories of 
structure formation predicts different statistical features, both in the present Universe 
and in the primordial fluctuations encoded in the microwave background. To the 
scope of quantifying the statistical feature of the matter clustering, several techniques 
have been proposed. One of these, which is increasingly popular in astrophysics, is 
the estimation of parameters via the maximum likelihood method. For instance, the 
maximum likelihood method is currently widely employed in the analysis of cosmic 
microwave background (CMB) experiments, large scale surveys and cosmic velocity 
fields. Once a suitable likelihood function (LF) has been constructed, one estimates 
the best parameters simply by finding the maximum of the LF with respect to those 
parameters. The parameter estimates, say ai{xi) (the hat is to distinguish between the 
estimates and the theoretically expected parameters), are then functions of the data 
Xi, i.e. of the random variables, and are therefore random variables themselves. If one 
is able to determine the distribution function P{ai) of the estimators, the confidence 
region (CR) of the parameters can be found as the value of the parameters for which 
the integral of P{ai) falls below a predetermined level. 

There are however two problems with this approach. One is that usually we don't 
know how the data are distributed, and the usual Gaussian approximation may be very 
poor. This is the case, for instance, in large scale structure, where we already know 
that the density fluctuations are not Gaussian, even on fairly large scales. The second 
problem is that, even if wc know perfectly well the data distribution, is often not trivial 
to find an analytical expression for the distribution P{ai) of the parameter estimators 
ai. Aside the simplest case in which one only needs the raw sample variance or the 
sample mean of normal variates (or closely related quantities), one has invariably to 
resort to very time-consuming MonteCarlo methods. While this second problem can be 
always overcome by numerical methods, the first difficulty remains, unless one takes 
into consideration specifically designed non-Gaussian models, and for each of these 
determines the confidence regions for the relevant parameters. Other than being too 
model-dependent, in the current astrophysical applications this procedure is in many 
cases prohibitively slow. It is then of interest to examine alternatives able to retain 
the useful features of the likelihood method while allowing more freedom in exploring 
different non-Gaussian (non-G, for shortness) distributions. 

In this work we propose a perturbative method to estimate theoretical parameters 
when the higher-order multivariate moments (or n-points correlation functions) are 
non- vanishing, via an expansion around a Gaiissian LF, the multivariate Edgcworth 
expansion (MEE). As long as the perturbative approach docs not break down, i.e. as 
long as the departure from Gaussianity is mild, the MEE gives an answer to the first 
problem, the distribution function of the data, because it allows arbitrary values of 
the higher-order correlation functions. Then we still are left with the second problem: 
how do we determine in the general case the distribution function for our parameter 
estimators, necessary to produce the confidence regions? A first simple possibility is 
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to approximate the P{ai) around its peak, previously determined by maximization of 
the LF, by a Gaussian distribution, multivariate in the parameter space. This allows 
to determine an approximate covariance matrix, whose eigenvalues give the principal 
axis of the parameter CR (sec e.g. Kendall, Stuart & Ord 1987). Notice that this is 
equivalent to assume the parameter estimators, which are functions of all the dataset, 
as Gaussian distributed, but makes no assumptions on the data themselves. When the 
number of data is large, this procedure can be justified by the central limit theorem 
(which however cannot guarantee the asymptotic Gaussianity in the general case). 
This first possibility, along with its limitations, is discussed in Sect. 3. To overcome 
the limits relative to the Gaussian approximation of the estimator distribution, and 
exploiting the analytic properties of the MEE, we adopt in Sect. 4 and Sect. 5 a second, 
exact, way to determine the CR for some of the relevant parameters. This is a non- 
Gaussian generalization of the technique: instead of finding the CR by integration 
over the unknown distribution function of the sample estimators, we determine the CR 
by integrating the LF over the possible outcomes of an experiment. As in the usual 
method, the acceptable values of a parameter will be all those for which the data lie not 
too distant from the predicted values, the "distance" being measured by the quantity 
Xo = XiX^Xj, where Xi are the actual data and A*-' is the inverse of the correlation 
matrix. The CR will in general depend on all the higher-order correlation functions 
included in the MEE, as it will be shown in Sect. 4. The formalism is then best suited 
to answer the question: how our results (i.e., best estimates and CR) change when 
the higher-order moments of the data distribution are not set to zero? If we have any 
reason to believe in some particular values for the non-G moments, then we can plot 
the CR for our parameters given those higher-order moments, and clearly the regions 
will be different for any set of higher-order moments. The relevance of examining how 
the confidence regions vary with respect to the non-G parameters is clear. Suppose 
in fact that two experiments, assuming Gaussianity, produce two non-overlapping CR 
for the same parameter, say the overall normalization of the correlation function. In 
general, the CR will be different in the non-Gaussian case, and it may happen that the 
two experiments are infact compatible when some level of non-Gaussianity is assumed. 
As we will show, in most cases the CR widens for positive higher-order moments, so 
that two non-overlapping results can be brought to agreement, provided some amount of 
non-Gaussianity is allowed. Other positive features of our formalism are that it exploits 
the full set of data, that it can be extended to higher and higher order moments, and 
that it is fully analytical. 

To the order to which we limit ourselves here, we will be able to estimate the first 
non-G correlation function, i.e. the third-order moments. This estimate will share the 
good and less good properties of likelihood estimators: they are consistent estimators, 
but only asymptotically (i.e., for large samples) unbiassed, as we will show in Sect. 3. 
For the fourth-order cumulant there is not an estimator at all, since the LF is linear 
in it. We decided to keep track of it anyway, because it is still interesting to use the 
fourth-order cumulant as an external parameter, and see how our results change with 
different assumptions on it. 

In principle one can include in the analysis all the set of higher-order moments 
considered relevant to the problem, but here we will limit ourselves to the first two 
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higher-order terms, the 3- and 4-points correlation functions. For most purposes, this 
is the best we can do for comparing different models with observations, since current 
data do not permit accurate analysis of correlation functions of order higher than the 
fourth one. 

Let us remark that we call here likelihood function the probability distribution 
function f{xi, ai) of the data Xi (our random variables) defined in a sample space S, 
given some theoretical parameter ai, which can be thought to lie in the parameter space 
P. Essentially, for any point in the parameter space, i.e. for any distribution function, 
we will integrate the LF over the sample space S, i.e. over all the possible outcomes 
of the experiment, to determine how likely or unlikely is the possibility that the actual 
data set has arisen from such a parametric choice. For a discussion of the advantage 
and disadvantage of this approach with respect to the alternative Bayesan one, in 
which the integration occurs over the space of the theoretical parameters (as opposed 
to the sample estimators of the theoretical parameters considered in the frequentist 
approach), we refer to standard textbooks hke Kendall, Stuart & Ord (1987). 

Beside presenting the basic formalism of the non-Gaussian LF, we discuss briefly 
in Sect. 6 its application to large scale structure and to the CMB. In the first case the 
non-Gaussian nature of the galaxy distribution is a well-established fact, so that the 
use of a non-G LF is certainly required. In the case of CMB, the current set of data 
is still not accurate enough to assess the issue. The estimate of a confidence region in 
non-G models is however crucial in view of the discrimination among different theories 
of structure formation. 



2 Formalism 

Let be a set of experimental data, i — 1, ..N, and let us form the variables — (P—f, 
where f are the theoretical expected values for the measured quantities. To fix the 
ideas, one can think of as the temperature fiuctuation in the i-th pixel in a CMB 
experiment, or as the number of galaxies in a given volume of the Universe. Let be 
the correlation matrix 

c'^ =< x'x^ > , (1) 

and let us introduce the higher-order cumulant matrices (or n-point correlation func- 
tions) 

k''^'' = < x'x^x'' > (skewness matrix), (2) 
j^ijki ^ ^ x^x^x^x^ > -c^-'c^' - c^'^c'' - c*V^ (kurtosis matrix) (3) 

(we will sometimes use the words "skewness" and "kurtosis" to refer to the 3- and 4- 
point correlation functions, respectively, or to their overall amplitude; in the statistical 
literature, the definition of skewness is actually, in our notation, 71 = A;"Y(c**)^/^, 
and for the kurtosis 72 = /c*''y(c*')^). The correlation matrices depend in general 
both on a number of theoretical parameters a^, j = 1,..P (that we leave for the 
moment unspecified) and on the experimental errors. In most cases, we can assume 
the experimental errors to be Gaussian distributed (or even uncorrelated) so that they 
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can be completely characterized by the correlation matrix e*^, which is simply to be 
added in quadrature to the 2-point correlation function. It is useful to define then the 
matrix 

^ {d^ + e'^YK (4) 

The problem of estimating the parameters aj is solved by maximizing, with respect to 
the parameters, the likelihood function 

L = /(x) , (5) 

where /(x) is the multivariate probability distribution function (PDF) for the random 
variables xi. Clearly, knowing the LF one can, at least in principle, determine also the 
parameter CR, as will be discussed in the next sections. The main difficulty to this 
approach, however, is that we do not know, in general, the exact form for the PDF 
/(x). The usual simplifying assumption is then that /(x) is a multivariate Gaussian 
distribution 

L, = /(x) = ^(x. A) = (27r)-^/^|A|V2exp(-lx^A,,a;^) . (6) 

where |A| = det(Aij). This is usually assumed, for instance, in analysing the CMB 
fluctuation maps and the cosmic velocity fields. A straightforward way to general- 
ize the LF so as to include the higher-order correlation functions, which embody the 

non- Gaussian properties of the data, is provided by the multivariate Edgeworth expan- 
sion (MEE). An unknown PDF /(x) can indeed be expanded around a multivariate 
Gaussian G{x, A) according to the formula (Chambers 1967; McCullagh 1984; Kendall, 
Stuart & Ord 1987) 

/(x) = G{^,\)[l+''-k'^%^k{^,\) + ^^k'^^%^^^^^^ , (7) 

where hij,, are Hermite tensors, the multivariate generalizations of the Hermite poly- 
nomial. If there are r subscripts, the Hermite tensor /iy.. is said to be of order r, and 
is given by 

/i,,-... = (-l)'-G'-i(x, A)%..G'(x, A) , (8) 

where dij,,, = {d / dxi){d / dxj).... The Hermite polynomials are located on the main 
diagonal of the Hermite tensors, when Ajj = 5ij. Notice that the function /(x) is nor- 
malized to unity, since the integrals of all the higher order terms from minus to plus 
infinity vanish. It can be shown that the MEE gives a good approximation to any dis- 
tribution function provided that all the moments are defined and that the higher order 
correlation functions do not dominate over the Gaussian term. In other words, the 
MEE can be applied only in the limit of mild non-Gaussianity. More accurately, the 
approximation is good, in the sense that the error one makes in the truncation is smaller 
than the terms included, if the cumulants obey the same order-of-magnitude scaling 
of a standardized mean (Chambers 1967). This condition is satisfied, for instance, 
by the cumulants of the galaxy clustering in the scaling regime, which explains why 
the (univariate) Edgeworth expansion well approximates the probability distribution 
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of the large scale density field (Juszkiewicz et al. 1994, Kofman & Bernardeau 1994). 
The same expansion has been also applied to the statistics of pencil-beam surveys, 
in which the one-dimensional power spectrum coefficients can be written as a genuine 
standardized mean (Amendola 1994). Finally, it has also been used to go beyond the 
Gaussian approximation in calculating the topological genus of weakly non-Gaussian 
fields (Matsubara 1994). Let us also note that the MEE lends itself to a further gener- 
alization: if the experimental errors are not Gaussian distributed, then the expansion 
for the data given the error correlation functions e*-' ■ is the same as in Eq. (|^), but 
with the new cumulants K"^^" = k^^" + e*-^". In fact, let x] be the theoretical values 
whose measure is given by the data d\ and let = (i* — x] be the experimental error. 
The theoretical values are random variables in the sense that the theory usually pre- 
dicts only their distribution, not their definite value. For instance, once the monopole 
is subtracted, the standard cosmological models predict CMB fluctuations Gaussian 
distributed with zero mean, = 0. Then we are concerned with the distribution of 
X* = + {d"^ —xl) = xl + ^^, the sum of the theoretical values xl and of the experimental 
errors both of which are random variables. If the two are independent, the general 
theorems on the random variables ensure that the cumulants cumulate, i.e. that the 
cumulants of x^ are the sum of the ones of xl and of 

Two properties are of great help in dealing with the MEE. The first is that k^^^--- and 
hijk... are contra- and co-variant tensors, respectively, with respect to linear transfor- 
mations of the variables Xj. It follows then that /(x)(ix is totally invariant with respect 
to the linear transformations which leave invariant the quadratic form = x^XijxK 
This property is very useful, because we can always diagonalize the quadratic form by 
choosing a linear combination = Ajx"^ such that = x^^ijx^ = y^SijV'^ ■ The MEE in 
the new variables ?/* remains formally the same as in Eq. (^, with x — > ?/ and X ^ 6, 
but now G(y, S) factorizes, and all the calculations are simplified. Notice that even if 
the new variables are uncorrelated, they are not statistically independent, since they 
are not (in general) Gaussian variates. The higher-order matrices are then not diag- 
onalized. In the following we will often assume that the variable transformation has 
been already performed, so that we will write y and 6 instead of x and A, leaving all the 
other symbols unchanged. The second useful property is that the MEE is analytically 
integrable if the integration region is bounded by = const. This property will be 
exploited in Sect. 4. 



3 Best estimates and asymptotic confidence re- 
gions 

The likelihood estimates for the parameters are to be obtained by maximizing Eq. 
(0) with respect to the parameters. To illustrate some interesting points, let us put 
ourselves in the simplest case, in which all data are independent, and we only need to 
estimate the parameters a and k^ entering the 2- and 3-point correlation function as 
overall amplitudes: 

Cij = a'^5ij, kijk = ksSijSjk . (9) 
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Because we are in such a simplified case, we will recover several well-known formulae 
of sampling statistics, like the variance of the standard deviation a and of k^. It is 
important to bear in mind, however, that the MEE is much more general than we 
are assuming in this section, since it can allow for full correlations among data, for 
experimental errors, and for non-linear parametric dependence. 

For simplicity, we also assume that the sample kurtosis is negligible. Because of 
this, we can put the fourth order sample cumulant of the dataset to zero (see, e.g., 
Kendall, Stuart & Ord 1987): 

= (iV-i)(A.-2)(iV-3) «^ +l)Y,4-HN-l)(£,4f]= 0, (10) 



so that we have, for large iV, 



Y.< = ^iY.^f- (11) 



We show here that the maximum likelihood estimators for the variance and for the 
skewness in the case of independent data and for — >• cxo reduce to the usual sample 
quantities 



S-2 



a 



Y.^V{N-l), (12) 

i 

^ (13) 



{N-l){N-2) 

We will assume also that the average has been subtracted from the data, i.e. that 
J^i^i = 0. This actually reduces the degrees of freedom, but in the limit of large N 
we can safely ignore this problem. If the distribution function of Xi is approximated in 
the limit of small k^ by the univariate Edgeworth expansion 

/, = G(x„ a)[l + ksh./Q + klK,n2] , (14) 

where G{xi, a) is a Gaussian function, then the multivariate distribution function for 
the dataset is 

^(x)=n/- (15) 

i 

[In the notation of Eq. (|^), h^i = hm and h^i = By the definition in (^ we have 

hsi = a-^[x^ - 3a^Xi] . (16) 

Let us pause to evaluate the order-of-magnitude of the non-G corrections in the uni- 
variate Edgeworth expansion (|I^) . Assuming Xi ~ a, the first correction term is of 
the order of 71 = k^/a^, which is the dimensionless definition of skewness. The gen- 
eral rough requirement for the truncated Edgeworth expansion is then that 71 <^ 1. 
This condition will be encountered several times throughout this work. The maximum 
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likelihood estimators for a and are then the values a, which maximize L, or, 
equivalently, its logarithm logL. We have then the equations 



~ ^ da ~ ^ 



da 
and 



a- 



(17) 



dlogL 



E 



d\ogfi 



E 



, h, 

"-fij- 

6 36 ® 



dh i dks 
To first order in k^, the latter equation gives 



k^h^i k 
6 72 



3/, 



E 



/iei 

6 36 



^3,2 



36 



3j 



SO that our estimator is 



A;. 



6Ei^ 



3j 



Suppose now that the solution for a of Eqs. ([l^) and 
(H), with N 



(19) 



(20) 



I) is the usual variance estimator 
1. Then we can observe that, from dTT 



E = E ^ 



3a-'\3 E - 12a^ E + 5 E = E ^. 

|l^). Inserting ( pi]) in 



where in the last step we used Eq. ( pAf ) and Eq, 
finally (assuming that J2i = 0) 



k. 



Na- 



N 



QNa-^ , (21) 
(pO|) we obtain 

(22) 

and inserting 



which coincides with ([T3|) for large A^. Finally, going back to Eq. (|]J 
ks = ks we recover the sample variance = Y.i ^l/^y so that our proof is complete. If 
needed, the small bias introduced by a finite N can be easily removed just multiplying 
k^, a derived from the likelihood method by suitable functions of N. 

The same calculation can be carried out in the more general case of dipendent 
variables, but the search for the maximum is more simply performed numerically when 
the situation is more complicated (e.g., because of the presence of experimental errors, 
or of more parameters, or more complicate parameter dependence). We just quote 
the result when only an overall skewness parameter is required, as when the 3-point 
correlation function is given by kijk = k^Sijk, and the tensor Sijk is known (see Section 
5). The best estimate for k^ is then 



k. = -6 



ijk 



gijk glmnj^ 



ijklmn 



which reduces to the expression above when Sijk = SijSjk, using the relation 



E ^iiijjj — E + (E ^3j)^ ~ E ^ 



3i ' 



(23) 



(24) 
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and observing that {J2 h^Y is of order and thus neghgible. 

Once we have the best estimators Q;j(x) of our parameters, we need to estimate 
the confidence regions for that paramaters, i.e. the range of values in which we expect 
to find our estimators to a certain probabihty, given that the data distribution is 
approximated by the MEE. The problem consists in determining the behavior of the 
unknown distribution P[dj(x)], when we know the distribution for the random variables 
Xi. This problem is generally unsoluble analitycally, and the common approach is to 
resort to MonteCarlo simulations of the data. However, we can always approximate 
P(dj) around its peak by a Gaussian distribution multivariate in the parameter space] 
if the number of data oo, this procedure can be justified by the central limit 

theorem. For instance, if = J^^i/^y then its distribution will tend to a Gaussian 
whatever the distribution of the data Xi is, in the limit of large N. In more general 
cases (e.g. correlated data) the central limit theorem does not guarantee the asymptotic 
Gaussianity; we can expect however it to be a first reasonable approximation far from 
the tails. If this approximation is adopted, then it can be shown (see e.g. Kendall, 
Stuart & Ord 1987) that the covariance matrix of the parameters can be written as 



i_ (91ogL(x, «„) 



''ab 



daadah 



(25) 



where a, h run over the dimensionality P of the parameter space. The la confidence 
region is then enclosed inside the P-dimensional ellipses with principal axis equal to 
Ay^, where Xa are the eigenvalues of Tiah- Let us illustrate this in the same simplified 
case as above: N independent data characterized by variance ai = a and skewness 
a2 = k^. To further simplify, we assume that the mixed components S12 = S21 can be 
neglected (see below). The component S22 is then easily calculated as 

'^22 ~ " ^ hiiijjj /36 , (26) 

Thus, using Eq. (p^), the variance of k^ turns out to be (dropping the hats here and 
below) 

E22 = 6orVAr, (27) 

which, not unexpectedly, is the sample skewness variance, i.e. the scatter in the skew- 
ness of Gaussian samples (for the dimensionless skewness defined as 71 = k^/a^ the 
variance is 6/N). In other words, to this order of approximation, the variance in the 
sample skewness in non-G data equals the variance in the sample skewness of Gaussian 
distributed data. The generalization to dependent data is 

S2-2' = -s'^^s'^^^h.^klnJ^G (28) 

which gives then the variance of the estimator k^ in the general case. 

More interesting is the error in the variance parameter a when not only a non- 
zero skewness k^ is present, but also a non-zero kurtosis parameter ^4, defined in a 
way similar to k^ as kijki = k^Sijki- Then the result turns out to be, in the same 
approximations as above, 

2 

Sii = 1^ [1 + 72/2] , (29) 
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where 72 = k^ja^ is the dimensionless kurtosis. The mixed components amount to 
Yjy2 = = —Nh^/o'^. Then we see that in the determinant of Sat we have [S^g^]^ <^ 
[S]3^'^S22^] for fcs/o"^ ^ 1, which is again the mild non-Gaussianity condition we are 
assuming throughout this work. 

Eq. (^) is again an expected resuhs: it is infact the variance of the standard devi- 
ation (J for independent data when a non-zero fourth-order moment is included. The 
first term in (^) is the usual variance of the sample variance for Gaussian, independent 
data. The second term is due to the kurtosis correction: it will broaden the CR for a 
when is positive, and will shrink it when it is negative. Depending on the relative 
amplitude of the higher-order corrections, the CR for the variance can extend or re- 
duce. It is important however to remark that this estimate of the confidence regions 
is approximated, and that it can be trusted only around the peak of the likelihood 
function. This means that we cannot use the CR estimated by the method exposed 
here when we are interested in large deviations from the best estimates. The true CR 
will in general be more and more different from this simple estimate as its probability 
content grows: to a confidence level of, say, 99.7%, we cannot reliably associate a CR 

1 /2 

of 3Ejf ', as we would do were the a perfect Gaussian distribution. 

This limitation is the main motivation for the rest of this work. Adopting a 
technique, we will be able to give an analytical expression for the CR of the variance 
(or of other parameters entering x^) when non-G corrections are present. This is useful 
whenever we actually measured non-vanishing higher-order cumulants and wish to 
quote a CR for the variance allowing for the non-Gaussianity, or when, more generally, 
we have reason to suspect that our data are non-Gaussian and we wish to investigate 
how the CR vary with different non-Gaussian assumptions. As already remarked, non- 
Gaussianity can also be invoked to put in agreement two experimental results reporting 
non-overlapping CR. The results of the next sections will confirm the approximate 
trend of Eq. (P^D, as long as the CR does not extend into the tails of the paramater 
distribution. 

4 Non-Gaussian method 

If our data are distributed following the MEE, then we can measure the likelihood to 
have found our actual dataset integrating the LP over all the possible outcomes of our 
experiment. According to the method, the actual dataset is more likely to have 
occurred, given our assumptions, the higher is the probability to obtain values of 
larger than the measured Xo- Then the relevant integral we have to deal with is 



where the region of integration extends over all the possible data values which lie 
inside the region delimited by the actual value Xo- The function M{xo) gives then 
the probability of occurrence of a value of x^ smaller than the one actually measured. 
We can then use M{xq) for evaluating a CR for the parameters which enter Xq, like 
the quadrupole and the primordial slope in the case of CMB. The CR will depend 




(30) 
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parametrically on the higher-order moments; however, this will not provide a CR for 
the higher-order moments themselves. The method of the previous section can always 
be employed to yield a first approximation for such moments. Both too high and 
too small values of Xo have to be rejected; fixing a confidence level of 1 — e, we will 
consider acceptables the values of the parameters for which M{xo) is larger than e/2 
and smaller than 1 — e/2. Notice that the theoretical parameters enter M{xo) both 
through the integrand L{x, A) and the integration region Xo- This section is devoted 
to the evaluation of (0). 

Let us split the LF into a Gaussian part and a non-G correction, 

L = + . (31) 

The integral of the Gaussian part is the standard one, and it is easily done: 

2 

fGix,X)l[dxi= fG{y,6)l[dyi= ^ PN{x')dx' = F^ixo) , (32) 

where Pn{x'^) is the PDF with N degrees of freedom, and F/v its cumulative integral. 
Notice that the integral has been performed over a compact region in x-space whose 
boundary is = Xo- It is then convenient to change variables in the integral, from y^ 
to the hyperradius and — 1 angles 

Jlldx.^lJ {x'Y~'d{x')dn = ^J {x'r'd{x') , (33) 

i 

where p = N/2 and = 27r^/r(p) (the surface area of a unitary (A^ — l)-sphere). 
The same procedure will be applied several times in the following. 

For the non-G sector we have to consider separately the three last terms in the 
MEE (^. However, since it will be shown shortly that the term linear in the skewness 
does not contribute to the final result, we will focus only on the two last terms in the 
MEE. Let us consider the term in fc*-''^'. Its integral can be written as 

K = — — — J G{x, X)hijki{x, X)Y{dxi 

i 

= '^^^ J G{y,6)h,,M{y,S)l[dyi = J didjd,dil[G.dy, , (34) 

i i 

where Gi = (27r)~^/^ exp(— 1/^^/2) and where the kurtosis tensor /c*-''^' in the first line 
is to be calculated with respect to x*, while in the last line with respect to y*. Since 
y^ = AjjX", then 

k'^'\y) = A^^AiA'^A^.k-'-'ix) , (35) 

and likewise for /c*-''^. Suppose now one of the subscripts, say i, appears an odd number 
of times, like in fc*"-'. Let us call then i an odd index. The integral K will then be odd 
in yi. Since the integration region is symmetric around the origin, K would vanish. 
This shows that any term in K containing odd indexes of fc*-''^' must vanish. This 
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explains also why the skewness term hijk-, which always contains some odd index, gives 
no contribution to the likelihood integral. The only non-zero terms in K are then of 
the type y^^^ and y^^^ (the index order is irrelevant). We then need only two kinds 
of integrals for as concerns K. Let us evaluate the first kind: 

/i = / n G^dy,[j dy,dykdplG,Gk] . (36) 

The inner integral must be evaluated inside the circle bounded by = Xo~x'jk^ where 
Xjk = J2{i^j,k) Hi - Transforming the variables {yj, yu) to the radius pjk and the angle 6*, 
and integrating over the new variables, we obtain 

^1 = / ( n Gidy,)G2{pjk)fi{pjk) = GNixo) J fiipjk) Yi ^ (^7) 

i^j,k i^j,k 

where we define G'Ar(xo) = (27r)~^/^ exp(— Xq/2), and where 

Mp,k)=A^P%-p%)/^. (38) 

Changing again variables under the integral to the hyperradius x'jk ^^nd — 3 angles, 
we obtain 

h = lGNixo)A2p, fiiPjk)ix%r-'dx% = qiiXo)GNiXo) , (39) 
where p2 = {N — 2)/2, and where 

qiiXo) = I^'^^'X^IN + 2 - xl]/r{2 + N/2) . (40) 

All the other integrals we need can be obtained in similar ways. For instance, the 
second kind of non- vanishing integral in K is 

n Gdy^[J dy.dp,] = ^-G^{xo)A2p, f2{Pi){x]r-'dx] = g2(xo)G'^(xo) , 

(41) 



where pi = (A^ — l)/2 and 



g2(Xo) = ^vr^/\o^ [at + 2 - xl] /r(2 + Ar/2) . (42) 

For as concerns the last term in (^, we need only to evaluate three new integrals, from 
the terms hjjkkii, hjjjjkk and hj,,,j. Let us denote these integrals by and J5. In 

complete analogy to the two integrals /i,/2, we find Jj = Giy{xo)(li{Xo) where 

qsiXo) = ^vr^/'x^/^(Xo) , 
q.iXo) = ^vr^/^X^(Xo) , 

q^iXo) = f vr^/2x^/i(xo) , (43) 
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and where 



+ 



Xo 



r (1 + iv/2) r (2 + iv/2) r (3 + n/2) 



(44) 



These expressions are all what we need for the complete evaluation of the likelihood 
integral. The general result is then 



M(xo) = j LX{dxi = FN{xo 



2r(2 + N/2) 



a(iv + 2-xo) + a(^- 



, (45) 



where Ca = ci + 3c2, and Ch = + 3c4 + 15c5, and the coefficients q are formed by 
summing over all the even diagonals of the correlation tensors k^^" and multiplying 
for the Edgeworth coefficients (1/24) for ci,C2 and (1/72) for 03,04 and C5. Let us 
denote with Tra;,..(A;*-' - ) the sum over all the disjoint partitions of a-plets, 6-plets, etc. 
of equal indexes: e.g., 1i22{k^^^^) means summing over all terms like fi;^^^^ ^ f^tkik ^ j^ikki 
but without including A;****; or, Tr24(A;*-''^A;''"") means summing over terms like k^^^k^^^ 
or k^^^y^^. Then, the coefficients Cj in Eq. (|45|) can be written as 



Cl = (l/24)Tr22(fc^^'='), 
C3 = (l/72)Tr222(A:*''A;""") 
C5 = (l/72)Tr6(F^'^fc''™"). 



C2 



C4 



(l/24)Tr4(F^'=') , 

= (l/72)Tr24(A;*^'''A;'"'') 



(46) 



Let us make some comments on the result so far obtained. First, notice that M{xo) 
is a cumulative function and as such it has to be a monotonically increasing function 
of its argument bounded by zero and unity. This provides a simple way to check the 
consistency of our assumptions: when the higher-order moments are too large, the 
MEE breaks down, M{xg) is no longer monotonic, and can decrease below zero or 
above unity. Second, let us suppose that the higher-order correlation functions are 
positive, which is the case for the galaxy clustering (see Section 6). Then the non-G 
corrections in Eq. (|45|) are negative for Xq ^ N. The value of Xo(^); corresponding 
to a probability content M{xo) = 1 — is a measure of how large is the confidence 
region associated with the threshold e, if xo is single-valued on the parameter space, a 
common occurrence in practice. The fact that the corrections are negative for Xo ^ ^ 
implies that the value of xo = Xoi.^) is larger than in the purely Gaussian case, in the 
limit of e ^ 0. Consequently, if the higher-order correlation functions are positive, the 
confidence regions are systematically widened when the non-Gaussian corrections are 
taken into account. For e not very close to zero is not possible to make such a definite 
statement; the regions of confidence will widen or narrow depending on the value of 
the moments, as will be graphically shown in the next section. Let us remember that 
for two-tail tests the CR is enclosed by the upper hmit Xo^''(^/2) and the lower limit 

/ox 

Xo (1 "~^/2), and the limits behave differently depending on e and on the higher-order 
moments. Finally, it is easy to write down the result in the particular case in which 
all the cumulant matrices are diagonal, i.e. for statistically independent variables. 
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In this case the variables are simply equal to xYcTj, if = (A*)^^/^, and we can 
put A;***(?/) = fc*"(x)/cr^ = 7i^j, and likewise = 72,^ (skewness and kurtosis 

coefficients). Then, we have 01 = 03 = 04 = 0, and Eq. ( ^5]) can be simplified to 



M(xo) = Fm{xo) + G'7v(xo)g(xo) 



(47) 



where 



(iV + 2)r(A^/2) [24 



(iV + 2)-X^]+|7? 



-(iV + 2) + 2x^ 



A^ + 4 



(48) 

and where we introduced the average squared skewness, 7^ = I] 7i j/A^, and the average 
kurtosis, 72 = Y.l2,i/N. 



5 Graphical examples 

This section is devoted to illustrate graphically some properties of the function M(xo) 
in its simplified version ( ^7D above, first putting 71 = 0, then 72 = 0, and assuming 
= 10 and = 100. In all this section we can think of Xo as depending monotonically 
on one single parameter, for instance the overall normalization A > of the correlation 
function: Xo(^) — x''x^{Acij + eij)~^. Then it appears that Xo decreases from a finite, 
positive value to zero as A goes from zero to infinity. The range in which A is bounded 
increases or decreases with the bounding range of Xo', "we can then speak of a CR 
on Xo meaning in fact the corresponding CR on the parameter A. In the general 
case, the relation between xo and its parameters can be quite more complicated. In 
Fig. la (for 71 = and N = 10), we show how the function M{xo) varies with 
respect to the non-Gaussian parameter 72. Schematically, for Xo/-^ > 1' function 
M(xo) decreases when 72 > and increases in the opposite case. As anticipated, for 
too large a 72, M{xo) develops a non-monotonic behavior. The consequence of the 
behavior of M{xo) on the confidence region of xo is represented in Fig. lb, where 
the contour plots of the surface M(xo;72) are shown. Consider for instance the two 
outer contours, corresponding to M = .01, the leftmost, and M = .99, the rightmost. 
The important point is that the range of Xo inside such confidence levels increases for 
increasing 72; with respect to the Gaussian case, 72 = 0, the acceptable region for xo 
widens substantially even for a small non-Gaussianity. As a consequence, a value as 
high as, say, Xo/^ = 2.5, is inside the 99% confidence level if 72 > .2. The same is 
true for the other contour levels, although with a less remarkable trend. This behavior 
confirms the approximate result of Eq. (^9]) . As anticipated, this means that the 
non-G confidence regions will be larger and larger (if the higher moments are positive) 
than the corresponding Gaussian regions for higher and higher probability thresholds. 
Notice that in this case the parameter 72 itself cannot be given a CR, since as we 
already noticed the LF has no maximum when varied with respect to it. We can use 72 
only as an external parameter, either provided by theory, or estimated from the data 
in some other way. Fig. 2a, b reports the same features for 71 = 0, A^ = 100. Now the 
confidence regions are much narrower, because of the increased number of experimental 
data. 
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The situation is qualitatively different considering 72 = and varying 71, the av- 
erage skewness. Fig. 3 and Fig. 4 are the plots for this case {N = 10 and = 100, 
respectively). The contour levels are obviously symmetric for ±71. Now for any given 
Xo there is a CR for 71 and viceversa, so that a bound can be given on each parameter 
given the other one, although the joint CR for both parameters is infinite. Now one 
can see two different features in the contour plots. For the outer contours, delimiting 
levels of 1% on both tails, the CR of xo increases for larger |7i|, with a minimum for 
the Gaussian case. For the internal contours, however, the CR actually shrinks for 
larger |7i|, being maximal at the Gaussian point. It is clear that in the general case, 
7i)72 7^ 0, the topography of the LF can be quite complicated. 

6 Comments on practical application 

The results of the previous sections can be employed to estimate theoretical parameters 
and confidence regions in several interesting cases. We consider here two of these, the 
large scale structure (LSS) of galaxies and the CMB. 

In the case of LSS surveys, the data usually consist of the fluctuations = 6n^/n 
in the number counts of galaxies in the i-th cell in which the survey is partitioned. (We 
assume here for simplicity that the average density n is fixed a priori. Otherwise, we 
can include it in the set of parameters to be estimated.) The main problem in applying 
the formalism developed so far to real situations is to choose a "good" set of theoretical 
parameters aj. In principle we can parametrize the statistical properties of the LSS in 
an infinite number of ways. However, the particular set of parameters we are going to 
adopt has been singled out in the current literature, both theoretical and observational, 
with very few exceptions. Assuming for the correlation function the power-law form 
^(r) = {ro/ry, the cell-averaged c^^ is given by the following expression 

c'' = J ari2)WRXri)Wn^{r2)dr,dr2 , (49) 

where = |ri— r2|, and Wr. [Wr.) is the normalized window function of characteristic 
size Ri (Rj) relative to the i-th (j-th) cell. If the cells i,j are fully characterized by a 
size R and a separation Sij, the integral (|49| ) can be written as 

= J{^,R/s,,){R/ro)-\ (50) 

where J(7, R/sij) is a dimensionless function of 7 and R/sij. Following standard work 
(e.g. Peebles 1980) we will then write for the higher-order correlation functions the 
following expressions 

k'^^^ = i?, ^ cV^c'^' + ^ c'^c* V , (51) 
2 3 

where J22 (Z^s) means summing over all the 12 (4) tree graphs with at most two (three) 
connecting lines per vertex (i.e. summing over topologically equivalent graph config- 
urations). Note that we define Q,Ra and Rb in terms of the cell-averaged correlation 
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functions, rather than in terms of C,{r), as currently done. Our definition has the 
advantage that from Q, Ra and Rb one can obtain directly the often quoted scaling 
coefficients 5*3 = 3Q and 5*4 = 12Ra + 4:Rb, without complicated integrals over the 
window functions. The drawback is that our Q, Ra, Rb cannot be compared directly to 
the values reported in literature, albeit the difference should be very small. 

Several analysis of large scale surveys show that Q, Ra, Rb are fairly constant over 
several scales, and of order unity. On scales larger that ~ 10h~^ Mpc, however, 
the power-law form of is not longer acceptable. For such scales is preferable to 
parametrize instead the power spectrum P{k) and to use the identity 



where Wk is the Fourier transforms of the window function. Various forms of P{k) 
have been proposed so far. For instance, one can assume the simple functional form 
proposed by Peacock (1991), with its two scale parameters ko,ki, or the CDM-like 
form of Efstathiou, Bond & White (1992), involving an overall normalization and a 
dimensionless parameter F. To give an idea of how big the non-G corrections are, let 
us assume to have N independent data (i.e., data on cells at separations much larger 
than the correlation length) and let us estimate the parameters 71,72 of Eq. (0). 
Since 71,1,72,4 and ai are the same for all the data, one has (dropping the subscript 
i) 72 = A;****(x)/(t'^ = S^a"^ and 'jf = S^a"^. For ^ 3 and ^4 ^ 20, as large scale 
surveys suggest, one gets 72 ~ 20a^, and ~ lOa^. For scales around 10 Mpc or 
so, where cr^ ~ 1, 71, 72 are then very large, but they decrease rapidly for larger scales. 
On scales larger than 30 Mpc or so, 71,72 are small enough to use the MEE also 
near the tails. 

The non-G LF allows a determination of the parametric set in such a way that 
the best estimate of one parameter depends on all the other ones, unlike the common 
procedure of estimating one parameter fixing the others (in particular, fixing the non- 
Gaussian parameters to zero). For instance, the usual way of estimating ro,7 is to 
find the best power-law fit to the observed correlation function, which amounts 
to assume a Gaussian distribution around the mean values. Both the estimate and 
the confidence region would then be corrected by the higher order terms. However, 
as already mentioned, we can use the MEE for estimating the higher-order moments 
themselves only if enough terms have been included in the expansion. The reason is 
clear by looking at the Eq. (|^: at this order of truncation, the expansion is linear in the 
fourth order moment, and as a consequence it has no maximum when derivated with 
respect to, e.g., Ra or Rb. The best estimate does not exist at all. We can give however 
an estimate for Q, and we can expect it to be a good estimate as long as it is in the 
regime in which the MEE holds. A simple way to check this is to see whether for that 
value of Q the function M{xo) is well-behaved, i.e. is a monotonic increasing function 




(52) 



from which, using Eq. (|49|), 




(53) 
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bounded by zero and unity. In principle, one can proceed further, including more and 
more terms in the LF, so that one can reach not only a higher degree of approximation, 
but also estimate the error introduced by the truncation itself. Needless to say, these 
goods come at the price of a factorial increase in algebraic complication. 

Once we have chosen our parameter set, the only remaining difficulty is to evaluate 
the coefficients ci,..C5. Let us remark that the Eqs. (|5lD are vahd with respect to 
the original data x*, while we need the correlation functions for = A'jX^ to evaluate 
Ci, ..C5. The relation between the two sets of correlation functions is provided by Eq. 
(^). The evaluation of ci, ..C5 is straightforward. One needs simply to scan all the 
possible combinations of indexes k, .. and sum only those tensor components with 
all equal indexes (for C2 and C5), or those with all paired indexes (for Ci and C3), or 
finally those with a (2,4) index structure (for C4). A more explicit expression for the 
coefficients can also be found in specific cases (e.g. exploiting the symmetry under 
index permutation of the tensors in (|51|)), but the general calculation can be coded so 
easily on computers that we prefer to leave it in the form (^Bj). Let us then summarize 
the steps needed to analyze a given set of data. First, one selects a value for the chosen 
parameter set inside a plausible range. Second, one diagonalizes, for that particular 
parameter set, the quadratic form x^Xijx^ so to determine the matrix A* such that 

= A^jX^ . Third, one evaluates the five coefficients ci, ..C5 summing over all the 
required tensor components. Fourth, one evaluates L and M{xo) for the selected 
parameter set. Fifth, one repeates the four previous steps spanning a reasonable range 
in the parameter space. Finally, the values for which L has a maximum inside the 
range, if any, are the best estimate of the set of parameters, while the region for which 
e/2 < M(xo) < 1 — £^/2 defines their joint confidence region. 

For the CMB, the procedure is very similar. The major difference is the set of 
parameters we are interested in. For simplicity, let us consider an experiment like 
COBE, in which the large angular beam size is mainly designed to study the Sachs- 
Wolfe effect of primordial fluctuations. The two-point angular correlation function can 
be conveniently written as 



where aij is the angular separation between the i-th and j-th pixel on the sky, Wi{l3) is 
the observational window function relative to a beam angular size f5, Pi is the Legendre 
polynomial of order /, and Ci is defined in terms of the multipole coefficients as 



For the Sachs- Wolfe effect of fluctuations with power spectrum P = Ak^ we can derive 
the expected variance of the amplitudes (e.g. Kolb & Turner 1989) 



00 




(54) 



1=2 




(55) 



af =< la: 



>= 




(56) 



5 r[(3 + n)/2]r[/ + (5-n)/2] 
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where Q^ms is the expected quadrupole signal derived from the correlation function. 
The theoretical value for Ci is then Ci = {21 + l)crf , and it depends uniquely on 
and n. Finally, we rewrite Eq. (|5^ ) as 

oo 

c'^ = J2i2l + l)afWi{P)Piicosa,,) . (57) 

1=2 

The correlation function for the Sachs- Wolfe temperature fluctuations is then parametrized 
by Qrms The situation for the higher-order correlation functions is much less 

well established. Non-Gaussianity in the CMB is predicted by several models, like 
topological defect theories, or non-standard inflation, or can be induced by some kind 
of foreground contamination. There is not, however, a single, widely accepted way to 
parametrize non-Gaussianity in this context (see e.g. Luo & Schramm 1993 for some 
possible alternatives). A very simple possibility is to assume for the CMB n-point 
correlation functions the same kind of scaling observed in LSS. Preliminary constraints 
on the 3-point parameter from the CORE data have already been published (Hinshaw 
et al. 1994). Some model of inflation predicts indeed this sort of scaling, although 
the expected amplitude of the non-Gaussian signal in standard models is far below 
observability (Falk, Rangarajan & Srednicki 1993; Gangui et al. 1994). For small 
scale experiments, the parametrization is different, and often a Gaussian shape 
= Co exp(— a^j/2a^), is assumed. The formalism here presented can be clearly ap- 
plied to any desired form of the correlation function. 



7 Conclusions 

Let us summarize the results reported here. This work is aimed at presenting a new 
analytic formalism for parametric estimation with the maximum likelihood method 
for non-Gaussian random fields. The method can be applied to a large class of as- 
trophysical problems. The non-Gaussian likelihood function allows the determination 
of a full set of parameters and their joint confidence region, without arbitrarily fixing 
some of them, as long as enough non-linear terms are included in the expansion. The 
CR for all the relevant parameters can be estimated by approximating the distribution 
function for the parameter estimators around its peak by a Gaussian, as in Sect. 3. 
To overcome this level of approximation, in Sect. 4 we generalized the method 
to include non-Gaussian corrections. The most interesting result is then that the CR 
for the parameters which enter Xo is systematically widened by the inclusion of the 
non-Gaussian terms, in the limit of 5 — >^ 0. Two experiments producing incompatible 
results can then be brought to agreement when third and fourth-order cumulants are 
introduced. In the more general case, the CR may extend or reduce. 

While we leave the analysis of real data to subsequent work, we displayed some 
preliminary comments on the application to two important cases, large scale structure 
and cosmic microwave background. 

There are two main limitations to the method. One is that we obviously have 
to truncate the MEE to some order, and consequently the data analysis implicitly 
assumes that all the higher moments vanish. The second limitation is that the method 
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is not applicable to strongly non-Gaussian field, where the MEE breaks down. This 
can be seen directly from Eq. (^): for arbitrarily large constants C\ — C5 the likelihood 
integral is not positive-definite, although always converge to unity. Assuming the 
scaling relation of Eq. (|5l|), for instance, the condition < 1 will ensure that the 
higher order terms are not dominating over the lower terms, as long as the scaling 
constants are of order unity. Basing upon the current understanding of the matter 
clustering, we expect the condition of weak non-Gaussianity to hold for scales ranging 
from ~ 30h~^ Mpc to the horizon scale. 
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Figure caption 



Fig. la) Plot of M(xo) as a function oixl/N and of the dimensionless kurtosis 72, 
for ^yi — 0, N — 10. For 72 = we return to the usual cumulative function. Notice 
how for large kurtosis 72 the cumulative function M{xo) develops minima and maxima, 
indicating that the MEE is breaking down, b ) Contour levels of M{xo) corresponding to 
M — .01, .1, .2, .3, .7, .8, .9, .01, from left to right. Notice how the limits for xo broaden 
for increasing 72. 

Fig. 2 a) Same as in Fig. la, now with more data, N — 100. b) Contour levels of 
M(xo) for the same values as in Fig. lb. The CR is now much smaller than previously. 



Fig. 3 a) Same as in Fig. la, now with 72 = 0, = 10, and varying 71. b) 
Contour levels of M{xo) for the same values as in Fig. lb. 

Fig. 4 a) Same as in Fig. la, now with 72 = 0, N = 100, and varying 71. b) 
Contour levels of M(xo) for the same values as in Fig. lb. 
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